
    Random projections as regularizers: learning a linear discriminant from fewer observations than dimensions

    We prove theoretical guarantees for an averaging-ensemble of randomly projected Fisher linear discriminant classifiers, focusing on the case when there are fewer training observations than data dimensions. The specific form and simplicity of this ensemble permit a far more direct and detailed analysis than the generic tools used in previous works. In particular, we are able to derive the exact form of the generalization error of our ensemble, conditional on the training set, and based on this we give theoretical guarantees which directly link the performance of the ensemble to that of the corresponding linear discriminant learned in the full data space. To the best of our knowledge these are the first theoretical results to prove such an explicit link for any classifier and classifier ensemble pair. Furthermore, we show that the randomly projected ensemble is equivalent to applying a sophisticated regularization scheme to the linear discriminant learned in the original data space, and that this prevents overfitting in small-sample conditions where the pseudo-inverse FLD learned in the data space is provably poor. Our ensemble is learned from a set of randomly projected representations of the original high-dimensional data, so data can be collected, stored and processed in this compressed form. We confirm our theoretical findings with experiments, and demonstrate the utility of our approach on several datasets from the bioinformatics domain and one very high-dimensional dataset from the drug discovery domain, both settings in which fewer observations than dimensions are the norm.
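    As a rough illustration of the kind of ensemble described above (a minimal sketch, assuming binary labels in {0, 1}, a Gaussian projection matrix, and a small ridge term for numerical stability; the function names are illustrative, not the paper's):

```python
import numpy as np

def rp_fld_ensemble(X, y, k=20, n_members=50, seed=None):
    """Train an averaging ensemble of Fisher linear discriminants,
    each on an independent Gaussian random projection to k dimensions."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    members = []
    for _ in range(n_members):
        R = rng.standard_normal((k, d)) / np.sqrt(k)    # random projection matrix
        Z = X @ R.T                                     # projected training data (n x k)
        mu0, mu1 = Z[y == 0].mean(axis=0), Z[y == 1].mean(axis=0)
        Sw = np.cov(Z[y == 0], rowvar=False) + np.cov(Z[y == 1], rowvar=False)
        w = np.linalg.solve(Sw + 1e-6 * np.eye(k), mu1 - mu0)  # FLD direction in k dims
        b = -0.5 * w @ (mu0 + mu1)
        members.append((R, w, b))
    return members

def rp_fld_predict(members, X):
    """Average the members' discriminant scores and threshold at zero."""
    scores = np.mean([(X @ R.T) @ w + b for R, w, b in members], axis=0)
    return (scores > 0).astype(int)
```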

    Sharp generalization error bounds for randomly-projected classifiers

    We derive sharp bounds on the generalization error of a generic linear classifier trained by empirical risk minimization on randomly projected data. We make no restrictive assumptions (such as sparsity or separability) on the data: instead, we use the fact that, in a classification setting, the question of interest is really 'what is the effect of random projection on the predicted class labels?', and we therefore derive the exact probability of 'label flipping' under Gaussian random projection in order to quantify this effect precisely in our bounds.
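    The 'label flipping' event can be illustrated empirically with a small Monte Carlo sketch (the paper derives this probability exactly; the estimate below is only a stand-in for intuition):

```python
import numpy as np

def flip_rate(w, X, k, n_trials=200, seed=None):
    """Estimate, for each point x in X, the probability that a Gaussian random
    projection to k dimensions flips the predicted label sign(w.x)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    orig = np.sign(X @ w)                      # labels predicted in the full space
    flips = np.zeros(len(X))
    for _ in range(n_trials):
        R = rng.standard_normal((k, d)) / np.sqrt(k)
        proj = np.sign((X @ R.T) @ (R @ w))    # labels predicted after projection
        flips += (proj != orig)
    return flips / n_trials
```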

    Towards large scale continuous EDA: a random matrix theory perspective

    Estimation of distribution algorithms (EDAs) are a major branch of evolutionary algorithms (EAs) with some unique advantages in principle. They are able to take advantage of correlation structure to drive the search more efficiently, and they are able to provide insights about the structure of the search space. However, model building in high dimensions is extremely challenging, and as a result existing EDAs lose their strengths in large-scale problems. Large-scale continuous global optimisation is key to many modern real-world problems. Scaling up EAs to large-scale problems has become one of the biggest challenges of the field. This paper pins down some fundamental roots of the problem and makes a start at developing a new and generic framework to yield effective EDA-type algorithms for large-scale continuous global optimisation problems. Our concept is to introduce an ensemble of random projections of the set of fittest search points to low dimensions as the basis for a new and generic divide-and-conquer methodology. This is rooted in the theory of random projections developed in theoretical computer science, and exploits recent advances in non-asymptotic random matrix theory.
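    A minimal sketch of one model-building-and-sampling step suggested by this idea is given below; it fits a low-dimensional Gaussian in each randomly projected subspace and averages the back-projected samples, and is a simplified reading of the framework rather than the paper's exact update rule:

```python
import numpy as np

def rp_eda_sample(elite, n_new, k=5, n_proj=10, seed=None):
    """Generate n_new candidate points from an ensemble of random projections
    of the elite (fittest) search points.  elite has shape (m, d)."""
    rng = np.random.default_rng(seed)
    m, d = elite.shape
    mu = elite.mean(axis=0)
    centred = elite - mu
    new = np.zeros((n_new, d))
    for _ in range(n_proj):
        R = rng.standard_normal((k, d)) / np.sqrt(k)      # project elites to k dims
        Z = centred @ R.T
        cov = np.cov(Z, rowvar=False) + 1e-8 * np.eye(k)  # low-dimensional Gaussian model
        samples = rng.multivariate_normal(np.zeros(k), cov, size=n_new)
        new += samples @ R                                # map samples back to d dims
    return mu + new / n_proj                              # average over the ensemble
```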

    On the mechanism of response latencies in auditory nerve fibers

    Despite the structural differences of the middle and inner ears, the latency pattern in auditory nerve fibers to an identical sound has been found to be similar across numerous species. Studies have shown this similarity even in species with markedly different cochleae, or without a basilar membrane at all. This stimulus-, neuron-, and species-independent similarity of latency cannot be simply explained by the concept of cochlear traveling waves that is generally accepted as the main cause of the neural latency pattern. An original concept, the Fourier pattern, is defined, intended to characterize a feature of temporal processing, specifically phase encoding, that is not readily apparent in more conventional analyses. The pattern is created by marking the first amplitude maximum of each sinusoidal component of the stimulus, thereby encoding phase information. The hypothesis is that the hearing organ serves as a running analyzer whose output reflects synchronization of auditory neural activity consistent with the Fourier pattern. A combination of experimental, correlational and meta-analytic approaches is used to test the hypothesis. Phase encoding and the stimuli were manipulated to test their effects on the predicted latency pattern. Animal studies in the literature using the same stimulus were then compared to determine the degree of relationship. The results show that each marking accounts for a large percentage of the corresponding peak latency in the peristimulus-time histogram. For each of the stimuli considered, the latency predicted by the Fourier pattern is highly correlated with the observed latency in the auditory nerve fibers of representative species. The results suggest that the hearing organ analyzes not only the amplitude spectrum but also phase information in Fourier analysis, to distribute specific spikes among auditory nerve fibers and within a single unit. This phase-encoding mechanism is proposed to be the common mechanism that, in the face of species differences in peripheral auditory hardware, accounts for the considerable similarities across species in their latency-by-frequency functions, in turn assuring optimal phase encoding across species. The mechanism also has the potential to improve phase encoding in cochlear implants.
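    One hypothetical way to compute such markings, taking the phase of each rFFT component and solving for the first time at which its cosine reaches a maximum, is sketched below purely as an illustration of the phase-marking idea (not the paper's definition):

```python
import numpy as np

def fourier_pattern(stimulus, fs):
    """For each sinusoidal component of the stimulus, return the time of its
    first amplitude maximum.  stimulus is a 1-D signal sampled at fs Hz."""
    n = len(stimulus)
    spectrum = np.fft.rfft(stimulus)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    phases = np.angle(spectrum)                # component phases (cosine convention)
    times = np.full_like(freqs, np.nan)
    nonzero = freqs > 0
    # cos(2*pi*f*t + phi) first peaks at t = ((-phi) mod 2*pi) / (2*pi*f)
    times[nonzero] = ((-phases[nonzero]) % (2 * np.pi)) / (2 * np.pi * freqs[nonzero])
    return freqs, times
```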

    Maximum margin principal components

    Principal Component Analysis (PCA) is a very successful dimensionality reduction technique, widely used in predictive modeling. A key factor in its widespread use in this domain is the fact that the projection of a dataset onto its first K principal components minimizes the sum of squared errors between the original data and the projected data over all possible rank-K projections. Thus, PCA provides optimal low-rank representations of data for least-squares linear regression under standard modeling assumptions. On the other hand, when the loss function for a prediction problem is not the least-squares error, PCA is typically a heuristic choice of dimensionality reduction -- in particular for classification problems under the zero-one loss. In this paper we target classification problems by proposing a straightforward alternative to PCA that aims to minimize the difference in margin distribution between the original and the projected data. Extensive experiments show that our simple approach typically outperforms PCA on any particular dataset in terms of classification error, though the difference is not always statistically significant, and that, despite being a filter method, it is frequently competitive with Partial Least Squares (PLS) and Lasso on a wide range of datasets.
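    As an illustration of the objective only (not the paper's optimisation procedure), the sketch below measures how well a rank-K projection preserves the margin distribution for a fixed linear classifier, and performs a crude search over random orthonormal bases:

```python
import numpy as np

def margin_gap(X, y, w, P):
    """Mean absolute difference between the margins y * (w.x) computed on the
    original data and on its rank-K projection X P P^T (labels y in {-1, +1})."""
    margins_full = y * (X @ w)
    margins_proj = y * ((X @ P) @ (P.T @ w))
    return np.mean(np.abs(margins_full - margins_proj))

def best_random_basis(X, y, w, K, n_candidates=100, seed=None):
    """Crude baseline: keep the random orthonormal K-dimensional basis whose
    projection best preserves the margin distribution."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    best_P, best_gap = None, np.inf
    for _ in range(n_candidates):
        P, _ = np.linalg.qr(rng.standard_normal((d, K)))   # random orthonormal basis
        gap = margin_gap(X, y, w, P)
        if gap < best_gap:
            best_P, best_gap = P, gap
    return best_P, best_gap
```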

    Structure from randomness in halfspace learning with the zero-one loss

    We prove risk bounds for halfspace learning when the data dimensionality is allowed to be larger than the sample size, using a notion of compressibility by random projection. In particular, we give upper bounds for the empirical risk minimizer learned efficiently from randomly projected data, as well as uniform upper bounds in the full high-dimensional space. Our main findings are the following: i) In both settings, the obtained bounds are able to discover and take advantage of benign geometric structure, which turns out to depend on the cosine similarities between the classifier and points of the input space, and they provide a new interpretation of margin-distribution-type arguments. ii) Furthermore, our bounds allow us to draw new connections between several existing successful classification algorithms, and we demonstrate that our theory is predictive of empirically observed performance in numerical simulations and experiments. iii) Taken together, these results suggest that the study of compressive learning can improve our understanding of which benign structural traits, if they are possessed by the data generator, make it easier to learn an effective classifier from a sample.
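    The cosine similarities on which these bounds depend are straightforward to compute; a one-function illustration of the quantity itself (not of the bounds) might look like this:

```python
import numpy as np

def cosine_profile(w, X):
    """Cosine similarity between a halfspace classifier w and each input point,
    the geometric quantity the bounds in this work are stated in terms of."""
    norms = np.linalg.norm(X, axis=1) * np.linalg.norm(w)
    return (X @ w) / np.maximum(norms, 1e-12)
```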

    Maximum gradient dimensionality reduction

    We propose a novel dimensionality reduction approach based on the gradient of the regression function. Our approach is conceptually similar to Principal Component Analysis; however, instead of seeking a low-dimensional representation of the predictors that preserves the sample variance, we project onto a basis that preserves those predictors which induce the greatest change in the response. Our approach has the benefits of being simple to implement and interpret, while remaining very competitive with sophisticated state-of-the-art approaches.
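    One common way to realise such a projection, sketched here under the assumption that local linear fits are an acceptable gradient estimator (the paper's own estimator may differ), is to take the top eigenvectors of the average gradient outer-product matrix:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import NearestNeighbors

def gradient_basis(X, y, n_components=2, n_neighbors=20):
    """Estimate local gradients of the regression function by local linear fits,
    then return the top eigenvectors of the average gradient outer-product matrix,
    i.e. the directions along which the response changes most."""
    n, d = X.shape
    nn = NearestNeighbors(n_neighbors=n_neighbors).fit(X)
    M = np.zeros((d, d))
    for i in range(n):
        idx = nn.kneighbors(X[i:i + 1], return_distance=False)[0]
        local = LinearRegression().fit(X[idx] - X[i], y[idx])
        g = local.coef_                              # local gradient estimate at X[i]
        M += np.outer(g, g) / n
    eigvals, eigvecs = np.linalg.eigh(M)             # ascending eigenvalues
    return eigvecs[:, ::-1][:, :n_components]        # top directions first
```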

    Toward Large-Scale Continuous EDA: A Random Matrix Theory Perspective

    Estimation of distribution algorithms (EDAs) are a major branch of evolutionary algorithms (EAs) with some unique advantages in principle. They are able to take advantage of correlation structure to drive the search more efficiently, and they are able to provide insights about the structure of the search space. However, model building in high dimensions is extremely challenging, and as a result existing EDAs may become less attractive in large-scale problems because of the associated large computational requirements. Large-scale continuous global optimisation is key to many modern-day real-world problems. Scaling up EAs to large-scale problems has become one of the biggest challenges of the field. This paper pins down some fundamental roots of the problem and makes a start at developing a new and generic framework to yield effective and efficient EDA-type algorithms for large-scale continuous global optimisation problems. Our concept is to introduce an ensemble of random projections of the set of fittest search points to low dimensions as a basis for developing a new and generic divide-and-conquer methodology. Our ideas are rooted in the theory of random projections developed in theoretical computer science, and in developing and analysing our framework we exploit some recent results in non-asymptotic random matrix theory. MATLAB code is available from http://www.cs.bham.ac.uk/~axk/rpm.zi